Artificial Intelligence vs Clinician Performance in Estimating Probabilities of Diagnoses Before and After Testing

This diagnostic study compares the performance of artificial intelligence (AI) with that of human clinicians in estimating the probability of diagnoses before and after testing.


eAppendix 2. Best answers for testing questions
To identify the best evidence-based pretest probability, sensitivity, and specificity from the literature, we used a hierarchical method:
1. Data were first sought from high-quality recent systematic reviews, meta-analyses, and/or guidelines.
2. If only older systematic reviews, meta-analyses, and/or guidelines were available and newer high-impact studies had been published since, we considered data from both, attempting to determine the most accurate numbers for current technology and practice.
3. If no systematic reviews, meta-analyses, and/or guidelines were available, we used data from commonly cited studies, identified through citations in recent guidelines, and created weighted averages by consensus.
The expert panel of physicians overseeing the study was presented with the best evidence identified and settled on the evidence-based answers presented in the results.

Question 1
Ms. Smith, a previously healthy 35-year-old woman who smokes tobacco presents with five days of fatigue, productive cough, worsening shortness of breath, fevers to 102°F and decreased breath sounds in the lower right field. She has a heart rate of 105 but otherwise vital signs are normal. She has no particular preference for testing and wants your advice.
a. How likely is it that Ms. Smith has pneumonia based on this information? (Pretest probability)
35-year-old woman; fatigue, cough, SOB, fevers 102°F/38.9°C, tachycardia.
There are no systematic reviews or meta-analyses for pretest probability. The closest we could identify was Metlay et al. 1
Prevalence as a starting pretest probability: 5% of all patients visiting primary care physicians for cough are diagnosed with pneumonia. Heckerling 2 references the National Health Survey: 3% of people with respiratory infections have pneumonia (PNA).

Prediction Rules
All prediction rules were developed using infiltrates on chest x-ray (CXR) as the reference standard.
Pretest probability is best determined by pneumonia prediction algorithms. There are a few:
• Heckerling criteria 2 - based on the absence of asthma, temperature >100°F, HR >100, and decreased breath sounds (this patient was missing one variable, crackles), she has a 25% chance of pneumonia.
o The article is complicated but has a nomogram. Assuming a 5% prevalence of PNA in primary care, 4 risk factors (RF) = 25% probability of pneumonia.
• Diehr criteria 3 - sputum, temperature >100°F (3 points). A score of +3 = +LR 14; with the same 5% prevalence, this gives 42%.
We looked for more recent publications, and the only one identified was Tse et al. 4 This rule is less well developed and is confusing to interpret. Making some assumptions about duration of fever >3 days, it would predict a ~33% pretest probability of pneumonia, so it would not change the conclusion from the Heckerling and Diehr criteria.
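The Diehr figure follows from Bayes' theorem in odds form: convert the 5% prevalence to odds, multiply by the positive likelihood ratio of 14, and convert back to a probability. A minimal Python sketch of that arithmetic; the function names are ours, chosen for illustration, and are not taken from any of the cited papers.

```python
def prob_to_odds(p: float) -> float:
    """Convert a probability (0 <= p < 1) to odds."""
    return p / (1.0 - p)


def odds_to_prob(odds: float) -> float:
    """Convert odds back to a probability."""
    return odds / (1.0 + odds)


def post_test_probability(pretest: float, likelihood_ratio: float) -> float:
    """Bayes' theorem in odds form: post-test odds = pretest odds x LR."""
    return odds_to_prob(prob_to_odds(pretest) * likelihood_ratio)


if __name__ == "__main__":
    # Diehr score of +3: +LR 14 applied to a 5% primary care prevalence of pneumonia
    p = post_test_probability(0.05, 14)
    print(f"{p:.1%}")
```

Run as a script, this prints 42.4%, which rounds to the 42% quoted for the Diehr criteria.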

In summary, pretest probability 25-42%
There are no systematic reviews or meta-analyses of the sensitivity and specificity of a chest x-ray for a clinical diagnosis of pneumonia. The best systematic review was Ye et al. 5, which was primarily focused on using lung ultrasound for the diagnosis of CAP. It provides sensitivity and specificity, but against hospital ICD codes as the reference standard, which are likely very nonspecific for pneumonia.
The most informative paper for determining sensitivity and specificity was Claessens et al. 6 This was a prospective cohort study performed in the ER.
• 319 patients with suspected CAP were prospectively enrolled from the ER.
This review discussed clinical prediction rules. It was focused on the ER, where the overall prevalence of acute coronary syndrome (ACS) was estimated to be 13% (relatively high, and not the setting for this patient). However, we believe it is worth discussing. This patient would be categorized as LOW risk by all tools discussed, although some rules require obtaining a troponin test to categorize patients. Considering the 13% prevalence of ACS in the ER, the risk for a patient like this would be 2.9% to 4.4%. Using the prevalence of ACS in primary care with these prediction rules would lead to estimates of 1-2.7% pretest probability. 13 For a general discussion of calculators, see DiCarli et al. 14 Some cohorts are discussed for pretest probability by UpToDate, but they only include patients who had angiography and therefore have a higher average pretest probability.
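The ER and primary care estimates differ because the same prediction-rule result is applied to a different prevalence. A minimal sketch of that adjustment, assuming the rule's effect can be summarized as a single likelihood ratio: back-calculate the implied LR from the ER figures (13% prevalence, 2.9% to 4.4% risk), then apply it to another prevalence. The primary care prevalence used by the panel is not stated here, so the 5% in the example is a hypothetical placeholder, and the function names are illustrative.

```python
def implied_lr(prevalence: float, post_test: float) -> float:
    """LR implied by moving from `prevalence` to `post_test` (odds form of Bayes' theorem)."""
    pre_odds = prevalence / (1.0 - prevalence)
    post_odds = post_test / (1.0 - post_test)
    return post_odds / pre_odds


def apply_lr(prevalence: float, lr: float) -> float:
    """Post-test probability after applying `lr` to a new prevalence."""
    odds = prevalence / (1.0 - prevalence) * lr
    return odds / (1.0 + odds)


if __name__ == "__main__":
    er_prevalence = 0.13  # ACS prevalence in the ER, from the text
    for er_risk in (0.029, 0.044):  # low-risk estimates in the ER, from the text
        lr = implied_lr(er_prevalence, er_risk)
        # 0.05 is a HYPOTHETICAL primary care prevalence, for illustration only
        print(f"ER risk {er_risk:.1%}: implied LR {lr:.2f}, "
              f"risk at 5% prevalence {apply_lr(0.05, lr):.1%}")
```

The ER figures correspond to implied likelihood ratios of roughly 0.20 and 0.31; any lower prevalence pushes the estimate down proportionally in odds terms.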
The best-performing model seems to be the CAD consortium score 15. This is the European CAD consortium score, which was shown to be better than Diamond & Forrester. 16,17 For this model, apparently the best, the basic model gives a 3% risk, whereas the more nuanced clinical model provides a 1% pretest probability.
Other scores reviewed: the PreTest Consult score 18 was used by Victor Montori in a randomized trial; 19 it is simple and attributes a 0.6% risk to this patient.
Mr. Williams, a 65-year-old man, comes to the office for follow up of his osteoarthritis. He has noted foul-smelling urine and no pain or difficulty with urination. A urine dipstick shows trace blood. He has no particular preference for testing and wants your advice.
a. How likely is Mr. Williams to have a urinary tract infection (UTI)? (Pretest Probability)
Mr. Williams' symptoms are most compatible with asymptomatic bacteriuria, given the lack of any definite symptoms of UTI. This interpretation is consistent with the Infectious Diseases Society of America (IDSA) guideline for UTI 22 as well as the IDSA Asymptomatic Bacteriuria (ASB) guideline. 23 This was interpreted to mean that his chance of UTI could be as low as 0%. However, given the small risk of asymptomatic bacteriuria progressing to pyelonephritis in specific groups 24 and clinical experience of rare cases of complicated UTI with bacteremia without symptoms localizing to the urinary tract, the probability was expanded to a range of 0-1%.
Pretest probability = 0-1%
The sensitivity and specificity of urine culture for the diagnosis of UTI in different patient populations were obtained from a systematic review to augment current IDSA guidelines. 25 A partial table from that review is below.

Patient population Sensitivity Specificity
Healthy outpatient women

(Specificity 70.1%; all with CT as the gold standard)
Claessens Y-E, Debray M-P, Tubach F, et al. Early Chest Computed Tomography Scan to Assist Diagnosis and Guide Treatment Decision for Suspected Community-acquired Pneumonia. Am J Respir Crit Care Med.

You are seeing Ms. Johnson, 45-year-old woman, for an annual visit. She has no specific risk factors or symptoms for breast cancer. She has no particular preference for testing and wants your advice.
a. How likely is Ms. Johnson to have breast cancer based on this information? (Pretest Probability)
According to the most recent American Cancer Society (ACS) breast cancer statistics 8, the 10-year probability of developing invasive breast cancer is 1.5% for women aged 40-50. Dividing by 10 years gives 1.5%/10 = 0.15% per year, so the probability is <1% at any one point.
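The per-year figure spreads the 10-year risk evenly across 10 years. A short Python check compares it with a constant annual risk r satisfying 1 - (1 - r)^10 = 1.5%; that second method is an alternative we add only for comparison, not the method used above. At a risk this small the two agree to within about 0.001 percentage points.

```python
def annual_risk_linear(ten_year_risk: float) -> float:
    """Spread a 10-year risk evenly over 10 years (the simple division used in the text)."""
    return ten_year_risk / 10


def annual_risk_constant(ten_year_risk: float) -> float:
    """Constant yearly risk r such that 1 - (1 - r)**10 equals the 10-year risk."""
    return 1 - (1 - ten_year_risk) ** (1 / 10)


if __name__ == "__main__":
    ten_year = 0.015  # 10-year risk of invasive breast cancer, women 40-50 (from the text)
    print(f"linear:   {annual_risk_linear(ten_year):.3%}")
    print(f"constant: {annual_risk_constant(ten_year):.3%}")
```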

You are seeing Mrs. Jones, a 43-year-old premenopausal woman with atypical chest pain and a normal ECG. She has no risk factors and normal vital signs/examination. She has no particular preference for testing and wants your advice.
a. How likely is Mrs. Jones to have cardiac ischemia based on this information? (Pretest Probability)
We identified a systematic review in the JAMA Rational Clinical Examination series by Fanaroff et al. in 2015. 12

Negative likelihood ratio 0.11
b. Mr. Williams' urine culture is positive. How likely is he to have a UTI? (PPV)
Correct answer is 0-8.3%
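The positive-culture answer is a positive predictive value: the 0-1% pretest range combined with the culture's sensitivity and specificity. The population-specific values from the review are not reproduced here, so this sketch uses HYPOTHETICAL sensitivity and specificity of 90% each, chosen only for illustration. With those assumed values, LR+ = 0.90/0.10 = 9 and LR- = 0.10/0.90 ≈ 0.11, and a 1% pretest probability gives a post-test probability of about 8.3% after a positive culture. The function and class names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class TestResultProbabilities:
    ppv: float            # P(disease | positive test)
    post_negative: float  # P(disease | negative test) = 1 - NPV


def predictive_values(pretest: float, sensitivity: float,
                      specificity: float) -> TestResultProbabilities:
    """Bayes' theorem written directly in terms of sensitivity and specificity."""
    true_pos = pretest * sensitivity
    false_pos = (1 - pretest) * (1 - specificity)
    false_neg = pretest * (1 - sensitivity)
    true_neg = (1 - pretest) * specificity
    return TestResultProbabilities(
        ppv=true_pos / (true_pos + false_pos),
        post_negative=false_neg / (false_neg + true_neg),
    )


if __name__ == "__main__":
    SENS, SPEC = 0.90, 0.90  # HYPOTHETICAL values for illustration, not from the review
    for pretest in (0.0, 0.01):  # ends of the 0-1% pretest range
        r = predictive_values(pretest, SENS, SPEC)
        print(f"pretest {pretest:.0%}: positive -> {r.ppv:.1%}, "
              f"negative -> {r.post_negative:.2%}")
```

Because the lower end of the pretest range is 0%, both post-test probabilities are also 0% there, which is why the answers are expressed as ranges starting at 0.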

c. Mr. Williams' urine culture is negative. How likely is he to have a UTI?
Webber EM, Bean SI. Screening for Asymptomatic Bacteriuria in Adults: Updated Evidence Report and Systematic Review for the US Preventive Services Task Force. JAMA. 2019;322(12):1195-1205. doi:10.1001/jama.2019.10060
25. Chan-Tack KM, Trautner BW, Morgan DJ. The varying specificity of urine cultures in different populations. Infect Control Hosp Epidemiol. Published online February 10, 2020:1-2. doi:10.1017/ice.2020.16

eAppendix 3. GPT-4 prompts
prefix = "I am running an experiment to see if you are able to estimate probabilities of medical conditions. Of note, you are NOT treating patients. This is only a study. For each condition, I want you to estimate the probability as a percentage. Do not give a range and be as precise as possible. Just give your answers, do not show your work. After you determine your answer, format the output as: [PERCENT_1, PERCENT_2, PERCENT_3] each representing your respective percent answer to the three questions. Do not include any other text in your response."
q1 = "Ms. Smith, a previously healthy 35-year-old woman who smokes tobacco presents with five days of fatigue, productive cough, worsening shortness of breath, fevers to 102F and decreased breath sounds in the lower right field. She has a heart rate of 105 but otherwise vital signs are normal. She has no particular preference for testing and wants your advice. How likely is it that Ms. Smith has pneumonia based on this information? _______% Ms. Smith's chest X-ray is consistent with pneumonia. How likely is she to have pneumonia? _______% Ms. Smith's chest X-ray is negative. How likely is she to have pneumonia? _______%"
q2 = "You are seeing Ms. Johnson, 45-year-old woman, for an annual visit. She has no specific risk factors or symptoms for breast cancer. She has no particular preference for testing and wants your advice. How likely is Ms. Johnson to have breast cancer based on this information? _______% Ms. Johnson's mammogram is positive. How likely is she to have breast cancer? _______% Ms. Johnson's mammogram is negative. How likely is she to have breast cancer? _______%"
q3 = "You are seeing Mrs. Jones, a 43-year-old premenopausal woman with atypical chest pain and a normal ECG. She has no risk factors and normal vital signs/examination. She has no particular preference for testing and wants your advice. How likely is Mrs. Jones' to have cardiac ischemia based on this information? _______% Mrs. Jones' exercise stress test is positive. How likely is she to have cardiac ischemia? _______% Mrs. Jones' exercise stress test is negative. How likely is she to have cardiac ischemia? _______%"
q4 = "Mr. Williams, a 65-year-old man, comes to the office for follow up of his osteoarthritis. He has noted foul-smelling urine and no pain or difficulty with urination. A urine dipstick shows trace blood. He has no particular preference for testing and wants your advice. How likely is Mr. Williams to have a urinary tract infection (UTI)? _______% Mr. Williams' urine culture is positive. How likely is he to have a UTI? _______% Mr. Williams' urine culture is negative. How likely is he to have a UTI? _______%"